Forecasting CO2 emissions from road fuel combustion using grey prediction models: A novel approach

This paper proposes an optimized wavelet transform Hausdorff multivariate grey model (OWTHGM(1,N)) that addresses some of the weaknesses of the conventional GM(1,N) model such as inaccurate prediction and poor stability. Three improvements have been made: First, all inputs are filtered using a wavelet transform; second, a new time response function is established using the Hausdorff derivative; and finally, the use of Rao's algorithm to optimise the model's parameters as well as the ξ-order accumulated value of the observation data described by the Hausdorff derivative. In order to demonstrate the effectiveness of OWTHGM(1,N), it is applied to predict CO2 emissions from road fuel combustion. The new model scores 1.27% MAPE and 79.983 RMSE, and is therefore more accurate than competing models. OWTHGM(1,N) could therefore serve a reliable forecasting tool and used to monitor the evolution of CO2 emissions in Cameroon. The forecast results also serve as a sound foundation for the formulation of energy consumption strategies and environmental policies. • Modification, extension and optimization of grey multivariate model is done. • The model is very generic can be applied to a wide variety of energy sectors. • OWTHGM(1,N) is a valid forecasting tool that can be used to track CO2 emissions.


a b s t r a c t
This paper proposes an optimized wavelet transform Hausdorff multivariate grey model (OWTHGM(1,N)) that addresses some of the weaknesses of the conventional GM(1,N) model such as inaccurate prediction and poor stability. Three improvements have been made: First, all inputs are filtered using a wavelet transform; second, a new time response function is established using the Hausdorff derivative; and finally, the use of Rao's algorithm to optimise the model's parameters as well as the -order accumulated value of the observation data described by the Hausdorff derivative. In order to demonstrate the effectiveness of OWTHGM(1,N), it is applied to predict CO 2 emissions from road fuel combustion. The new model scores 1.27% MAPE and 79.983 RMSE, and is therefore more accurate than competing models. OWTHGM(1,N) could therefore serve a reliable forecasting tool and used to monitor the evolution of CO 2 emissions in Cameroon. The forecast results also serve as a sound foundation for the formulation of energy consumption strategies and environmental policies.
• Modification, extension and optimization of grey multivariate model is done.
• The model is very generic can be applied to a wide variety of energy sectors.
• OWTHGM(1,N) is a valid forecasting tool that can be used to track CO 2 emissions. Specifications

Introduction
When predicting the evolution of a system and there is very little information, a major difficulty arises, which is that of accurately extracting the system's characteristic without making too strong assumptions. Deng's grey theory offers a way out [ 1 , 2 ]. However, when conventional multivariate grey models (GM(1,N)) are used, the results are very often inconsistent with reality, as Tien [3] pointed out. One way out is to optimise GM parameters with heuristic optimization procedures [ 4 , 5 ]. In some real-world scenarios, these methods have enhanced the GM(1,N) model's prediction capabilities. Despite these efforts, there is still room to improve its flexibility and forecasting performance, which is summarized as follows: • As demonstrated by Tien [3] , the conventional GM(1,N) model is a factor-based model that reflects the behavioural effects of the driving variables on the system. Unfortunately, the fact that information contained in the set of driving variables is incomplete renders the conventional GM(1,N) model, and even some improved versions, unsuitable for forecasting [ 6 , 7 ]. • The input variables might have irrelevant information (called noise) that corrupts the prediction outcomes [8] .
• The least squares method is used to solve grey's difference equation in order to estimate the GM(1,N) model's parameters. However, the first-order differential equation resulting from greys model is used to create the time response function. Although the differential equation and the difference equation approximate each other, they are fundamentally distinct [9] . The GM(1,N) model may become unstable as a result of this discrepancy between parameter estimation and parameter application.
Thus, in order to address the above deficiencies, this paper puts forward a new grey multivariate forecasting model based on the Hausdorff's derivative [10] and optimized by Rao's algorithm (abbreviated OWTHGM(1,N)). In order to demonstrate the efficiency and reliability of OWTHGM(1,N), a practical case is considered, which is the forecasting of the annual electricity consumption.

Step 2: Generate accumulated data with 1-AGO
The first-order accumulation generating operation (1-AGO) of the observation data can be defined by Eqs. (1) and (2) : where, The superscripts (0) and (1) respectively indicate the original sequences and 1-AGO sequences.

Step 3: Design mean sequences
The following definitions ( Eq. (3) ) represent the mean sequences produced by consecutive terms of (1) : where: Step 4: Establish grey's differential equation The basic GM(1,N) model's differential equation can be written as in Eq. (4) : ( 1 ) Consequently, if the matrix is invertible, the parameters [ 1 , 2 , … , , … , +1 ] can be calculated using the least-squares approach ( Eq. (5) ). Which is: where the matrices and are given by Eqs. (5.1) and ( 5.2 ) respectively, Step 5: Solve the system's differential equation The initial condition can be used to find the solution to Eq. (5) : The fitting value of (1) is obtained by substituting into Eqs. (4) and (6) . Eq. (7) can be used to determine the fitting value for the original sequence (0) 1 .

Data filtration using wavelet transform
If there is noise or useless information in the raw data, this will corrupt the GM(1,N) model resulting in poor forecasts [8] . To prevent this, data is cleaned using a Wavelet transform (WT) which is a mathematical function that filters various scale components from a continuous-time signal. Basically, WT is a band-pass filter with its bandwidth scaled to half at each level [13] . The scaling function filters out the lowest point of the transform and allows the entire spectrum to be taken into account. Eq. (8) defines a continuous wavelet transform (CWT) of a signal ( ) : where the scale and translation parameters, are denoted by and ( , ∈ ℝ ) respectively. The discrete wavelet transform (DWT) of a signal is calculated using Eq. (9) : where = 1 , 2 , … and are the sampling time and scale factor respectively. is the number of samples. The most crucial element of the signal is its low order component. The signal's identity is clearified in this component. The signal's high order component, on the other hand, is a representation of the signal's specifics. The first step in filtering data is to determine the wavelet to be applied. This depends on the nature of the raw sample data (i.e. the signal). Although there are several criteria for choosing the appropriate wavelet [14] , a simple way to do this is to perform a correlation analysis. The waveform of the signal is examined along with the shape of a wavelet (DWT or CWT). If the two match, that particular wavelet is be used as the mother wavelet for the signal. Thus, the choice will vary from one sample data to another.
Conceptually, it operates as follows: In order to produce lowpass (A1, generally known as the approximation level) and highpass (D1, also known as the detail level) subbands from a signal X, the signal is first filtered with specialized lowpass and high pass filters (see Fig. A1 , in Appendix A ). After filtering according to the Nyquist criterion 1 [15] , half of the samples are deleted. The filters often produce strong computational performance and have a limited number of coefficients.
These filters can help eliminate any aliasing brought on by down sampling when reconstructing the subbands. The lowpass subband (A1) is iteratively filtered by the same method to produce narrower subbands (A2, D2, and so on) for the following level of decomposition. Each subband's length of coefficients is divided by the total number of coefficients in the stage before it. In this way, the signal of interest can be captured by a few DWT coefficients of great size, and the signal noise is represented by smaller DWT coefficients. In this manner, DWT aids in the analysis of signals at various resolutions and narrower subbands. Additionally, it aids in signal compression and denoise.
Practically, the overall process of data filtration with DWT is done in Matlab (or other programming languages) based on the following three steps: Step 1: Obtain the approximation and detail coefficients.
To do this, a multilevel wavelet decomposition is used. For a fine scale study, the approximation subband is broken down at several levels or scales (as in Fig. A1 ).
Step 2: analyse the details and identify a suitable thresholding technique. This is done with Matlab (or other programming languages). Hard thresholding and soft thresholding are the two thresholding operations. The coefficients with magnitudes below the threshold in either operation are set to zero. The way these two methods handle coefficients with magnitudes greater than the threshold is what distinguishes them from one another. In the case of soft thresholding, the coefficients larger than the threshold are kept unaltered, whereas in the case of hard thresholding, they are decreased towards zero by subtracting the threshold value from the coefficient value. In this paper, we used the sure shrink with soft thresholding technique to denoise the data. 2 A single function is used to perform the thresholding of the coefficients as well as the reconstruction of the signal using the new coefficients (see the Matlab code in Appendix B for further information). In this code, sure shrink is the thresholding approach. The first parameter 'f' refers to the noisy signal. 'S' stands for soft thresholding, and the parameter 'sln' represents threshold rescaling with a single noise estimate based on first level coefficients. Level indicates the wavelet decomposition level and the last parameter specifies the wavelet, which is sym6 in this case. The wden function decomposes the input signal into many levels, computes and applies a threshold to the detail coefficients, reconstructs the signal using the updated detail coefficients, and outputs it.
Step 3: Threshold the detail coefficients and reconstruct the signal.
To begin, use the wavedec function to conduct a multilevel wavelet decomposition. The noisy signals are decomposed to five levels. Along with the detail coefficients from the first to the last levels, the function also outputs the fifth level approximation coefficients. The signal's high frequencies are captured by the first level of detail coefficients. The noise in the signal makes up the majority of the high-frequency content, but abrupt signal shifts make up a portion of the high frequency. 3 The details subband deserves a careful examination. The detcoef function is used to extract the coefficients, and the coefficients for each level are plotted. With increasing scale/level, the noise is significantly reduced. If we pay attention to level 1 specifics. The objective in this case is to keep abrupt transitions while removing noise. In order to accomplish this, a threshold is applied to the detail coefficients. The universal threshold is the simplest to compute and is computed using Eq. (10) : where is the signal and D is set of first level detail coefficients.

The novel Hausdorff GM(1,N) model
Step 1: Determine the -order accumulation The following introduces the idea of the fractal derivative of a function Ψ(t) with regard to a fractal measure [10] : Eq. (11) is also known as the Hausdorff derivative. is the fractal time with scale index . If for a given function Ψ(t) both its derivative Ψ(t) and its fractal derivative Ψ(t) exists, one can find an analogue to the chain rule: Thus, from Eq. (11.2) , it is possible to deduce the -order accumulated value of the observation data which then allows to define a new raw data set represented by ( ) in Eqs. (12) and (13) : Step 2: Establish grey's differential equation The proposed model's differential equation has the following form ( Eq. (14) ): Soft thresholding is always a good starting point if in doubt about which technique to choose. 3 There are times when these abrupt changes carry meaning, and it might be desirable to retain this information while denoising the signal.
Step 3: Solve the system's differential equation Taking integrals over the interval [ − 1 , ] on both sides of the above equation, we get Eq. (15) : We know that, and from Eq. (13) .., Therefore, Eq. (15) is as follows: Then, Eq. (15) If det ( ) ≠ 0 , then matrix can be estimated with Eq. (17.5) : The solution to Eq. (14) is given by Eq. (18) : where the function ( ) is expressed as in Eq. (19) , Step 4: Generate forecast values Appendix B provides a demonstration of how Eq. (18) is obtained. The following formula can be used to find the fitted value of the original sequence (0) 1 : Eq. (20) allows to calculate the forecast values. It is known as the wavelet transform Hausdorff grey multivariate model (abbreviated WTHGM(1,N)).

Optimizing wavelet transform Hausdorff GM(1,N) parameters
Eq. (18) can be expressed using the modified trapezoidal integral formula (given by Eq. (21) ): Still with modified trapezoidal integral formula, Eq. (17.3) can be rewritten as in Eq. (22) : An optimization procedure with the mean absolute percentage error (MAPE) set as the objective function can be used to find and . Eq. (23) gives a definition of MAPE.
where = (0) 1 ( ) − ̂ (0) 1 ( ) , and is the data size. The minimum MAPE value is then determined using a meta-heuristic approach. The ideal values of and are chosen so as to minimize the difference between the predicted CO 2 and the emission over a of years. By solving the following optimization problem ( Eqs. (24.1) and ( 24.2 )), the optimal values of and are ascertained for this purpose: , the value of which will be chosen according to the precision that we wish to achieve.

Selection of meta-heuristic algorithm
Other meta-heuristic algorithms such as genetic algorithms (GA), ABC and particle swarm optimization (PSO) need parameters that are algorithm-specific in addition to usual parameters like iterations and population size. For instance, while PSO needs inertia weight and elements related to social and cognitive learning, GA have special operators such as selection, mutation probability and crossover probability, whereas ABC requires number of employed bees, onlooker bees, scout bees and limits, etc.
Incorrectly setting algorithm-specific parameters can lead to undesired results, such as increasing convergence time or falling into local optimization. However, tuning the specific parameters is a very laborious process. The optimal specific parameters may vary from model to model and may vary from dataset to dataset. The selection of optimal parameters itself is an optimization problem.
Rao's algorithms, which have been developed recently, are straightforward and relatively simple to implement as they do not rely on the use of any algorithm-specific parameters. In particular, the improved Rao algorithm [16] is an algorithm that reinforces both exploitation and exploration, and has a fast convergence speed and a strong ability to deviate from local optimization.

Improved Rao algorithm for the WTHGM(1,N) model
Here, the population consists of individuals = ( , 1 , … , +1 ) ∈ ℝ +2 . Initially, each individual is randomly generated while satisfying the constraints of Eq. (24.2) . Population updates are carried out on local exploitation phase or global exploration phase with a probability of 0.5 to enhance both exploitation and exploration abilities.
In local exploitation phase, is sorted in incremental order of MAPE values. The first half population with the smallest MAPE values is considered as the best individuals while the worst individuals is the second half. Following that, Eqs. (25) and (26) are used to compute the local best-mean ( ̄ ) and the local worst-mean ( ̄ ) vectors respectively: Here 1 and 2 are uniformly distributed random vectors in the range [ 0 , 1 ] +2 . and are the best and worst individuals in respectively. ̄ and ̄ are the mean vectors of the best and worst population respectively. The new individual ′ is generated with Eq. (27) : Here 3 and 4 are uniformly distributed random vectors in the range [ 0 , 1 ] +2 . is the randomly selected other individual in . If is better than , ( or ) means , otherwise . In local exploration phase, the global population is considered. In the first iteration, is set as = and from the second iteration, the new is the old or the updated with a probability of 0.5. Then the individuals in are shuffled within itself. The new individual ′ is generated with Eq. (28) : Here and are the -th individual in and respectively. is the normally distributed random vector in the range [ 0 , 1 ] +2 .
After generating new individual ′ , it is compared with the old . If ′ is better than , is replaced with ′ in . This process is repeated until the MAPE of the best individual is smaller than the criterion (

Data selection, filtering and performance measurement
In order to verify the performance of the OWTHGM(1,N) model, we implemented it to forecast CO 2 emissions from Cameroon's road fuel combustion. In the OWTHGM(1,N) model implemented in this simulation, the filtering technique applied is DWT (i.e. Eq. (9) ). This choice follows from the approach explained at the end of the section entitled «Data filtration using wavelet transform».
The selected drivers are: CO 2 emissions, urban population (UP), GDP, road fuel consumption (RFC), fuel prices (PR) and vehicle fleet (VF) [ 17 , 18 ]. Data used in this study cover the period from 1995 to 2020. A prerequisite for good modelling is the selection of appropriate variables, i.e. those that have a significant influence on CO 2 emissions. So the first step is to ensure that these variables are highly correlated with CO 2 emissions before including them as independent variables in the forecasting model.
Correlation analysis (presented in Table 1 ), shows that CO 2 emissions are significantly correlated with selected variables. Therefore, it can be concluded that they can be used as inputs to model and forecast CO 2 emissions. However, it is better to filter the data with WT before the forecasting step. This precaution is crucial because the data may have noise that could introduce data matrix ill-conditioning problems in the estimation of grey model parameters.   Table 3 Optimum parameters obtained with data on CO 2 emissions from road transport in Cameroon.
where = (0) 1 ( ) − ̂ (0) 1 ( ) represents the th simulation error between actual CO 2 emissions (0) 1 ( ) and predicted outcomes ̂ (0) 1 ( ) . ℎ is the total number of predictions and ̄ (0) 1 ( ) is the mean of (0) 1 ( ) . Scores of MAPE and RMSE closest to zero indicate the best accuracy. Table 2 shows the threshold values for MAPE. Coming to FD, a score greater than 0 . 95 indicates a very high precision, and a score between 0 . 85 and 0 . 95 indicates a good forecasting accuracy. Overall, the more score is close to 1.0, the more accurate the forecasting model is. The forecasting model may overfit or underfit in case of data leakage. To ensure that this does not happen, the data set is divided into two (training and test sets). These two sets are concealed from each other so that there is no leakage. 4

Modelling CO 2 emissions
Prediction of CO 2 emissions from Cameroon's road fuel combustion are estimated with WTHGM(1,N) optimized by Rao's algorithm (denoted OWTHGM). For validation purposes, we also compare the estimates with GM(1,N) and WTHGM(1,N) models, as well as multilinear regression (MLR) as applied in the work of Karakurt and Aydin [21] . Training data spans from 1995 to 2017 while test data cover the period or this, the reference and new models are implemented using data collected over the period 1995-2017 to train them (i.e. parameterise the models), while data from 2018 to 2020 are used to test the forecasting performance of each model. In all, 24 simulations were performed on Matlab R2021a using a PC with 8.0 GB RAM. To go systematically, here is how to proceed:

Stage 1: Determining the filtering technique
We start by creating a copy of the raw data. One copy is filtered to remove any potential disturbances while the other copy is used as is. As mentioned above, we examined the shape of the wavelet and it matched with the DWT. Thus we apply Eq. (9) to denoise the input data. The next stage is modelling.

Stage 2: Modelling the WTHGM(1,N)
At this stage the unfiltered copy is used to establish the WTHGM(1,N) model. To do this, Eqs. (11) to (20) are applied.

Stage 3: Optimisation of the WTHGM(1,N)
Here, we use the copy of the raw data that has been filtered and build the OWTHGM(1,N) model given by Eqs. (11) to (28) (see pseudocode). With Rao's algorithm, the population size was fixed at 50 for 10 5 iterations. In the specific context of our data, we obtained the optimum parameters shown in Table 3 :

Stage 4: Generating all results
Finally, we generate all the forecast results given by the WTHGM(1,N) and OTHGM(1,N) models. From these results, we calculate the performance of both models to determine their MAPE, FD and RMSE. This allows us to see which model was more accurate.
As shown in Table 4 , it can be seen that with the training data (1995-2017), the conventional GM(1,N) model, WTHGM(1,N), its optimised version (OWTHGM), and MLR have similar performance in terms of MAPE. They all score a MAPE below 5%, which allows  concluding that they behave like class I models according to Table 2 . However, we note that the classical GM(1,N), WTHGM(1,N) and MLR models slightly outperform the new model in terms of RMSE, FD and MAPE. Fig. 2 a shows the fit curves of the new OWTHGM(1,N) model and the competing models. It is obvious that the OWTHGM(1,N) model (red curve) needs a short fitting time (represented by the first three predictions in the training phase) before it can efficiently extract the systems' information. This delay is confirmed by its FD, which is only 95% (see Table 4 ), the lowest of the four models. This explains why in Fig. 2 b the peak of APEs is observed on the column of OWTHGM(1,N) model, indicating a lower forecasting accuracy in the training phase. When data is hidden from each model to check whether they are able to predict CO 2 emissions over the period 2018-2020 (test phase), OWTHGM(1,N) model significantly outperforms the competing models. It appears that only MLR can cope with the novel model. It can be seen in Fig. 3 a how the predictive curve of OWTHGM(1,N) comes closest to real data. Moreover, FD criteria shows that OWTHGM(1,N) model has the best fit with a score of 99% while its MAPE and RMSE are the closest to zero with scores of 1.27% and 79.99 respectively (see Table 5 ). Fig. 3 b confirms this performance as the novel model actually has the smallest APE distribution.
The model proposed in this paper generates low forecasting errors (less than 5%), leading to the conclusion that OWTHGM(1,N) is a reliable forecasting tool just like its competitors. The novel model can forecast CO 2 emissions more precisely as well.
As for MLR model, the regression of (0) 1 (i.e. CO 2 emissions) on independent variables (i.e. (0) 2 , … , (0) ) reveals that GDP, UP and VF are the most significant determinants of CO 2 with contributions of 37.1%, 39.8% and 21.6% respectively. R-squared value is 0.97, indicating a strong relationship between CO 2 emissions and the independent variables. The implication is that the independent variables used in this study manage to explain 97% of Cameroon's road transport CO 2 emissions. Only 3% of variation remains unexplained, which is certainly due to insufficient data. However, even if more data were collected and other independent variables included, it would never be possible to explain all variations in CO 2 emissions [22] . Given the performance of OWTHGM (1,N), it can be concluded that the proposed model is better at predicting future CO 2 emissions without the need to know the underlying functional relationship between the different variables.  In order to strengthen the validation of the new model, we also applied it to other datasets, namely the annual CO 2 emissions from the Nigerian transport sector obtained from reference [23] . We also divided the data into two groups (training and test sets). The results from the training set are shown in Table 6 .
The results shown in Table 6 reveal that the OWTHGM(1,N) model outperforms all competing models on all criteria of comparison. For the forecasts made with the test data (see Table 7 ), the OWTHGM(1,N) model manages to obtain the best MAPE (1.74%) and therefore the best FD (0.953). When we come to the RMSE criterion, the WTHGM(1,N) model wins with a score of 1.58.
These results and the previous ones allow us to conclude that the new OWTHGM(1,N) model is capable of making very accurate forecasts.
Forecasting Cameroon's road transport CO 2 emissions CO 2 emissions from the Cameroonian road transport sector over the period 2021-2030 are predicted using the new model proposed in this study and its competitors. In this respect, we used the projected values of the independent variables (RFC, PR, GDP, UP and VF) by a simple regression. These values are confirmed by forecasts of the National Development Strategy [24] . Forecasts results show with great certainty that substantial increases in total CO 2 emissions from the Cameroonian road transport sector are to be expected in the coming years. In other words, CO 2 emissions from road transports will increase from 3779 kt in 2020 to 4600 kt in 2030 (see Table 8 ). Thus, current CO 2 emissions will increase by 120% in less than ten years. These results are in contrast to the Cameroon government's projections which aim to reduce total CO 2 emissions by 32% before 2030 [18] .   Table 8 Projection results of road fuel CO 2 emissions in Cameroun.

Conclusions
This paper proposes an optimized wavelet transform Hausdorff grey multivariate forecasting model (abbreviated OWTHGM (1,N)). In order to validate the model, we implemented it to forecast CO 2 emissions from road fuel combustion. For this purpose, we started by showing that GDP, urban population, road transport fuel consumption, fuel prices and size of vehicle fleet could be used as determinants of CO 2 emissions. Forecasts results are compared with the classical GM(1,N), WTHGM(1,N) and MLR, allowing to conclude that: • GDP, size of vehicle fleet and urban population are the most significant determinants of CO 2 emissions as confirmed by Refs. [ 17 , 18 , 25 ]. With these determinants, the proposed OWTHGM(1,N) model manages to completely extract the existing connections they have with CO 2 emissions.
• Forecasts of CO 2 emissions produced with OWTHGM(1,N) are more precise than competing models. Thus OWTHGM(1,N) is a reliable tool for monitoring CO 2 emissions from Cameroon's road transport sector.
These achievements stem from three improvements made in the classical GM(1,N) model. Initially, only significant drivers are considered. Then, all selected drivers are filtered using WT, thereby demonising all series that could hamper modelling. More so, a new time response function has been established using the Hausdorff derivative. Lastly, Rao's algorithm is used to optimise all parameters. The OWTHGM(1,N) model therefore proves to be a reliable forecasting tool and can be used to monitor the evolution of CO 2 emissions but can also be applied in other forecasting fields characterised by insufficient information.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.